Skip to content

Add regression test for byte-pool bug breaking ensure_capacity() #77

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 12, 2023

Conversation

link2xt
Copy link
Contributor

@link2xt link2xt commented Apr 5, 2023

I have investigated further into the original reason for #66 bug. While the bug was "fixed" by #74, it was still unclear to me why closed variable was set when the inner stream was never closed in the test. This is a follow-up to fix #78.

I have found that this was caused by a call to poll_read with an zero-size buffer. In this case zero bytes are read, which is treated as EOF.

This ensure_capacity() call is supposed to ensure at least 1 byte of capacity even if decode_needs is zero:
https://github.com/async-email/async-imap/blob/1963cedefc9828ce994eb63e01afcf5c1c65d4c9/src/imap_stream.rs#L288
It is documented that ensure_capacity() first of all "ensure[s] that buffer has free capacity", but due to a bug in byte-pool 0.2.2 it sometimes does not allocate more capacity even if the buffer is full.

The bugfix for byte-pool is at async-email/byte-pool#5 cc @dignifiedquire
Once fixed byte-pool is released, we can update to it to finalize this PR.
Alternatively, byte-pool can be replaced with bytes or we can require that the stream does its own buffering (#69), but I want to make a minimal fix first.

@flub
Copy link
Contributor

flub commented Apr 6, 2023

Alternatively, byte-pool can be replaced with bytes or we can require that the stream does its own buffering (#69), but I want to make a minimal fix first.

byte-pool and it's usage in async-imap are not ideal currently, from memory:

  • byte-pool wanted to never have a Vec with capacity != len since that's wasteful, but at the time of writing this was not yet possible with the rust std api. This is probably the source of the bug you just found since it was assuming capacity at that point in the code. IIRC you can actually write byte-pool the way it was intended now and not waste capacity. But no one has implemented this.
  • when using byte-pool's resize you change the block size in place. This then goes into the different free list once freed. but async-imap never takes blocks from the larger free list since it always starts with the smallest block. effectively leaking the larger blocks.
  • using AsyncBufRead is probably a fairly simple and safe solution, but will result in more allocations which is what byte-pool was trying to solve. though perhaps in reality this doesn't matter that much for normal amounts of data and correctness is always better...
  • using bytes could be nice, it's a well used library, but not obvious how you'd use it to save allocations. You'd still need some kind of free list(s) implementation i think?

@link2xt
Copy link
Contributor Author

link2xt commented Apr 6, 2023

byte-pool wanted to never have a Vec with capacity != len since that's wasteful, but at the time of writing this was not yet possible with the rust std api. This is probably the source of the bug you just found since it was assuming capacity at that point in the code. IIRC you can actually write byte-pool the way it was intended now and not waste capacity. But no one has implemented this.

Then I will also add a call to reserve_exact prior to resizing.

EDIT: done as a second commit for async-email/byte-pool#5

@link2xt
Copy link
Contributor Author

link2xt commented Apr 6, 2023

  • using AsyncBufRead is probably a fairly simple and safe solution, but will result in more allocations which is what byte-pool was trying to solve. though perhaps in reality this doesn't matter that much for normal amounts of data and correctness is always better...
  • using bytes could be nice, it's a well used library, but not obvious how you'd use it to save allocations. You'd still need some kind of free list(s) implementation i think?

What does saving allocaitions gets us? If it does not show up in memory profiler as deallocation and reallocation, it does not mean it is more performant. Allocator is just some library code, maybe a bit more expensive than looking up into a freelist, but I am not even sure about this as allocators have their own freelists sorted into bins afaik and are otherwise highly optimized. If they take caches/locality into account, may be using built-in allocator is better than a naive freelist implementation which does not care about fragmentation.

@link2xt
Copy link
Contributor Author

link2xt commented Apr 6, 2023

when using byte-pool's resize you change the block size in place. This then goes into the different free list once freed. but async-imap never takes blocks from the larger free list since it always starts with the smallest block. effectively leaking the larger blocks.

This memory leak issue I have not looked into detail yet, we need to open an issue here on in byte-pool repo for this depending on whether we consider resizing a misuse of byte-pool, or consider it to be a bug in byte-pool that it does not reuse resized blocks.

@flub
Copy link
Contributor

flub commented Apr 6, 2023

when using byte-pool's resize you change the block size in place. This then goes into the different free list once freed. but async-imap never takes blocks from the larger free list since it always starts with the smallest block. effectively leaking the larger blocks.

This memory leak issue I have not looked into detail yet, we need to open an issue here on in byte-pool repo for this depending on whether we consider resizing a misuse of byte-pool, or consider it to be a bug in byte-pool that it does not reuse resized blocks.

It's a difficult call. Should byte-pool take blocks from the larger free-list and move all current content? Or should it resize the existing block using the actual allocator and document this as intended behaviour?

Without a profiler I'm not really tempted to make that call, and am more likely to agree with your other statement that the global allocator is probably pretty good and we need profiling information to show we need to do better.

@dignifiedquire claims that was the case, but I never profiled that. I could dust off my old half-completed imapsyncer tool to pull my huge email account from the server and profile with or without byte-pool. That's how I found and fixed bugs here before.

@link2xt link2xt force-pushed the link2xt/ensure-capacity-bug branch from 00bb9b0 to abfbd51 Compare April 8, 2023 21:15
@link2xt link2xt force-pushed the link2xt/ensure-capacity-bug branch 2 times, most recently from d21bcad to 9f2b3de Compare April 12, 2023 10:58
@link2xt link2xt marked this pull request as ready for review April 12, 2023 11:41
@link2xt link2xt force-pushed the link2xt/ensure-capacity-bug branch from 9f2b3de to 0bb515b Compare April 12, 2023 11:45
@link2xt link2xt merged commit 0bb515b into master Apr 12, 2023
@link2xt link2xt deleted the link2xt/ensure-capacity-bug branch April 12, 2023 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ensure_capacity() does not always ensure at least one byte of capacity
2 participants